The goal of this project is to explore some location data with the folium library. Let's start off by importing folium for map creation and pandas for parsing a CSV file and creating a dataframe.

In [1]:
import folium
import pandas as pd

For this project, I have chosen a dataset containing the vehicle crash data for the state of Maryland. Most of the data is centered around Baltimore, so we will focus on that region for now.

Now let's parse the file and create a dataframe of crash data.

In [3]:
# low_memory=False makes pandas read the whole file at once, which avoids mixed-dtype warnings
crash_data = pd.read_csv("https://cmsc320.github.io/files/Baltimore_City_Vehicle_Crashes.csv", low_memory=False)
print(crash_data.columns)
crash_data
Index(['YEAR', 'QUARTER', 'LIGHT_DESC', 'LIGHT_CODE', 'COUNTY_NO', 'MUNI_DESC',
       'MUNI_CODE', 'JUNCTION_DESC', 'JUNCTION_CODE', 'COLLISION_TYPE_DESC',
       'COLLISION_TYPE_CODE', 'SURF_COND_DESC', 'SURF_COND_CODE', 'LANE_DESC',
       'LANE_CODE', 'RD_COND_DESC', 'RD_COND_CODE', 'RD_DIV_DESC',
       'RD_DIV_CODE', 'FIX_OBJ_DESC', 'FIX_OBJ_CODE', 'REPORT_NO',
       'REPORT_TYPE', 'WEATHER_DESC', 'WEATHER_CODE', 'ACC_DATE', 'ACC_TIME',
       'LOC_CODE', 'SIGNAL_FLAG_DESC', 'SIGNAL_FLAG', 'C_M_ZONE_FLAG',
       'AGENCY_CODE', 'AREA_CODE', 'HARM_EVENT_DESC1', 'HARM_EVENT_CODE1',
       'HARM_EVENT_DESC2', 'HARM_EVENT_CODE2', 'RTE_NO', 'ROUTE_TYPE_CODE',
       'RTE_SUFFIX', 'LOG_MILE', 'LOGMILE_DIR_FLAG_DESC', 'LOGMILE_DIR_FLAG',
       'MAINROAD_NAME', 'DISTANCE', 'FEET_MILES_FLAG_DESC', 'FEET_MILES_FLAG',
       'DISTANCE_DIR_FLAG', 'REFERENCE_NO', 'REFERENCE_TYPE_CODE',
       'REFERENCE_SUFFIX', 'REFERENCE_ROAD_NAME', 'LATITUDE', 'LONGITUDE',
       'LOCATION'],
      dtype='object')
Out[3]:
YEAR QUARTER LIGHT_DESC LIGHT_CODE COUNTY_NO MUNI_DESC MUNI_CODE JUNCTION_DESC JUNCTION_CODE COLLISION_TYPE_DESC ... FEET_MILES_FLAG_DESC FEET_MILES_FLAG DISTANCE_DIR_FLAG REFERENCE_NO REFERENCE_TYPE_CODE REFERENCE_SUFFIX REFERENCE_ROAD_NAME LATITUDE LONGITUDE LOCATION
0 2020 Q2 NaN 6.02 24.0 NaN NaN Non Intersection 1.0 Other ... Miles M N NaN NaN NaN NORTH AVE 39.311025 -76.616429 POINT (-76.616429453205 39.311024794431)
1 2017 Q2 Daylight 1.00 24.0 NaN NaN NaN NaN Single Vehicle ... NaN NaN NaN NaN NaN NaN NaN 39.282928 -76.635215 POINT (-76.6352150952347 39.2829284750108)
2 2020 Q2 Daylight 1.00 24.0 NaN NaN Intersection 2.0 Other ... Feet F S NaN NaN NaN WINDSOR AVE 39.312903 -76.651472 POINT (-76.651471912939 39.312903404529)
3 2020 Q2 Daylight 1.00 24.0 NaN NaN NaN 99.0 Same Movement Angle ... NaN U E NaN NaN NaN WASHINGTON ST 39.294944 -76.599329 POINT (-76.599328693204 39.294943770185)
4 2020 Q2 NaN 5.02 24.0 NaN NaN NaN NaN Same Direction Rear End ... NaN NaN NaN NaN NaN NaN NaN 39.296874 -76.595871 POINT (-76.595871121891 39.296873988072)
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
135585 2021 Q4 Unknown 99.00 24.0 NaN NaN Non Intersection 1.0 Same Direction Sideswipe ... Feet F E NaN NaN NaN LINDEN AVE. 39.315169 -76.635210 POINT (-76.635209753929 39.315168933901)
135586 2021 Q4 Daylight 1.00 24.0 NaN NaN Intersection 2.0 Single Vehicle ... Feet F N NaN NaN NaN N CALVERT ST 39.291235 -76.612507 POINT (-76.61250671377 39.291234751225)
135587 2021 Q4 Dark Lights On 3.00 24.0 NaN NaN Intersection 2.0 Same Direction Rear End ... Feet F S NaN NaN NaN E FAYETT 39.290660 -76.606670 POINT (-76.60667 39.29066)
135588 2021 Q4 Dark Lights On 3.00 24.0 NaN 999.0 Non Intersection 1.0 Same Direction Rear End Left Turn ... Feet F W 2125.0 MU NaN GARRISON BLVD 39.343406 -76.640647 POINT (-76.640646523492 39.343405723453)
135589 2021 Q4 Dark Lights On 3.00 24.0 NaN NaN Non Intersection 1.0 Same Direction Rear End ... Feet F N NaN NaN NaN CHARLES ST 39.293148 -76.609680 POINT (-76.609679764014 39.293147795494)

135590 rows × 55 columns

The first thing to note is that the dataframe has over 100,000 rows, and for the sake of performance, we cannot display them all on our map. So, instead of displaying all crashes, let's display the crashes for only a certain year.

It is also important to note that the data records many different features of a crash, such as the light description, collision type, surface condition, and more.

Let's make sure that each row has a longitude, latitude, and year that is not null. Let's also make sure that the year is a numerical type.

In [4]:
print(crash_data["YEAR"].value_counts())
print("---------------")
# count() already skips null values, so no extra filtering is needed
valid_latitudes = crash_data["LATITUDE"].count()
print("The number of valid latitudes is " + str(valid_latitudes))
valid_longitudes = crash_data["LONGITUDE"].count()
print("The number of valid longitudes is " + str(valid_longitudes))
print("The length of the crash data is " + str(len(crash_data)))
2016    25763
2015    23682
2017    19220
2018    17490
2019    17020
2021    16858
2020    15557
Name: YEAR, dtype: int64
---------------
The number of valid latitudes is 135590
The number of valid longitudes is 135590
The length of the crash data is 135590

We can see that there are no null years and that the year is of type int64. Additionally, the number of rows in the dataframe that have a valid location is 135590, which matches the total number of rows in the dataframe.
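
The same checks can be written a little more directly. Below is a minimal sketch, assuming a tiny made-up dataframe named `sample` in place of `crash_data` (its values are illustrative only, not taken from the dataset). It counts missing values per column and tests whether the year column is numeric.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Tiny made-up frame standing in for crash_data; the values are illustrative only
sample = pd.DataFrame({
    "YEAR": [2020, 2021, 2020],
    "LATITUDE": [39.31, 39.28, None],
    "LONGITUDE": [-76.61, -76.63, -76.65],
})

required = ["YEAR", "LATITUDE", "LONGITUDE"]

# Number of missing values in each required column
missing_counts = sample[required].isna().sum()
print(missing_counts)

# True when the YEAR column holds a numeric dtype
year_is_numeric = is_numeric_dtype(sample["YEAR"])
print(year_is_numeric)

# Keep only the rows where every required field is present
complete_rows = sample.dropna(subset=required)
print(len(complete_rows))
```

If any count in `missing_counts` were nonzero for the real data, `dropna(subset=...)` would be one way to discard the incomplete rows before mapping.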

Since 2020 has the fewest rows, let's display the crash data for that year. Now we need to find a criterion by which we can separate the 2020 data. Splitting the data this way allows us to display the different values in different colors on the map, so we can spot any locational trends that may appear.

We could separate the crashes by whether a signal flag is present near the crash. A signal flag is a flag which alerts drivers to nearby traffic conditions. Some things which could be written on a signal flag are "Congestion Ahead" or "Construction Zone".

In [5]:
print(len(crash_data[crash_data["YEAR"] == 2020]))
signal_flag_data = crash_data[crash_data["YEAR"] == 2020]["SIGNAL_FLAG_DESC"]
print(signal_flag_data.count())
15557
15557

For the year 2020, there is a signal flag description for each crash: both counts are 15557.

In [6]:
signal_flag_data.value_counts()
Out[6]:
No     9537
Yes    6020
Name: SIGNAL_FLAG_DESC, dtype: int64

By looking at the number of crashes with and without a signal flag, we can see that about 39 percent of the 2020 crashes (6,020 of 15,557) have a signal flag while about 61 percent (9,537) don't. While looking at the map, we should keep in mind that there are more crashes without a signal flag.
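
The split can also be computed directly rather than estimated by eye. Here is a minimal sketch, assuming a series rebuilt from the Yes/No counts printed above (the `flags` name is made up for illustration). On the real data the same call could be applied to `signal_flag_data`.

```python
import pandas as pd

# Rebuild a series with the same Yes/No counts reported above for 2020
flags = pd.Series(["No"] * 9537 + ["Yes"] * 6020, name="SIGNAL_FLAG_DESC")

# normalize=True returns each value's share of the total instead of a raw count
shares = flags.value_counts(normalize=True)
percentages = (shares * 100).round(1)
print(percentages)  # No: 61.3, Yes: 38.7
```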

Now let's create a folium map that is centered based on the mean of the location data in the dataframe.

In [7]:
map_osm = folium.Map(location=[crash_data["LATITUDE"].mean(), crash_data["LONGITUDE"].mean()],
                     zoom_start=11, width=1000, height=600)
map_osm
Out[7]:
[Interactive folium map centered on the mean crash location]

Let's now add all of our points to the map. The crashes in the year 2020 which had a signal flag will be displayed in blue, while the crashes in the same year without a signal flag will be displayed in red. This allows us to spot any locational trends associated with the presence of a signal flag at a crash.

In [8]:
# Create a new dataframe for only the year 2020 and iterate through its rows
filtered_data = crash_data[crash_data["YEAR"] == 2020]
for index, crash in filtered_data.iterrows():
    # Add the folium points as circles rather than markers
    # to improve performance
    if crash["SIGNAL_FLAG_DESC"] == "Yes":
        folium.Circle(location=[crash["LATITUDE"], crash["LONGITUDE"]],
                      color="blue").add_to(map_osm)
    elif crash["SIGNAL_FLAG_DESC"] == "No":
        folium.Circle(location=[crash["LATITUDE"], crash["LONGITUDE"]],
                      color="red").add_to(map_osm)
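
The color choice in the loop can also be expressed as a lookup table, which keeps the mapping in one place if more categories are added later. This is a sketch under stated assumptions, not the approach used above: `FLAG_COLORS`, `crash_points`, and the small `sample` dataframe are hypothetical names and made-up values introduced only for illustration.

```python
import pandas as pd

# Hypothetical lookup from flag value to circle color (same colors as the loop above)
FLAG_COLORS = {"Yes": "blue", "No": "red"}

def crash_points(frame):
    """Return (latitude, longitude, color) tuples for rows with a known flag value."""
    # Values missing from FLAG_COLORS map to NaN and are dropped on the next line
    colored = frame.assign(color=frame["SIGNAL_FLAG_DESC"].map(FLAG_COLORS))
    colored = colored.dropna(subset=["color"])
    return list(zip(colored["LATITUDE"], colored["LONGITUDE"], colored["color"]))

# Made-up rows standing in for the 2020 crashes
sample = pd.DataFrame({
    "LATITUDE": [39.31, 39.28, 39.30],
    "LONGITUDE": [-76.61, -76.63, -76.60],
    "SIGNAL_FLAG_DESC": ["Yes", "No", None],
})

points = crash_points(sample)
print(points)
```

Each tuple could then be passed to `folium.Circle(location=[lat, lon], color=color).add_to(map_osm)`, exactly as in the loop above.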
   
   
    

Now that all of the desired points have been added, let's display the map.

In [9]:
map_osm
Out[9]:
[Interactive folium map of the 2020 crashes: blue circles have a signal flag, red circles do not]

We can immediately notice the clustering of crashes with a signal flag towards the center of the city. The blue points certainly do extend outwards as well, but not nearly to the same degree as the red points do. The crashes without a signal flag still happen towards the center of the city, but the ones with a signal flag seem far more concentrated in that region. Right above South Baltimore, and towards the center of the map, we see a big cluster of blue points.
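
One way to check this visual impression numerically would be to compare how far each group sits from the center of the data. The sketch below is only an illustration under stated assumptions: it uses the standard haversine formula with an approximate Earth radius, the names `haversine_km`, `sample`, and `DIST_KM` are hypothetical, and the rows are made up rather than drawn from the crash data. A smaller median distance for one group would suggest that group is more tightly clustered around the center.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

# Made-up rows standing in for the 2020 crashes; these are not real results
sample = pd.DataFrame({
    "LATITUDE": [39.29, 39.30, 39.35, 39.23],
    "LONGITUDE": [-76.61, -76.62, -76.70, -76.55],
    "SIGNAL_FLAG_DESC": ["Yes", "Yes", "No", "No"],
})

# Use the mean location as the center, as was done when creating the map
center_lat = sample["LATITUDE"].mean()
center_lon = sample["LONGITUDE"].mean()

sample["DIST_KM"] = haversine_km(sample["LATITUDE"], sample["LONGITUDE"],
                                 center_lat, center_lon)

# Median distance from the center for each signal flag value
median_dist = sample.groupby("SIGNAL_FLAG_DESC")["DIST_KM"].median()
print(median_dist)
```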

The crashes without a signal flag seem to dominate the tunnels and bridges towards South Baltimore. I-83 (the highway that travels between Pikesville and Towson) also seems to be primarily red, indicating the lack of signal flags in that area.

This trend could be explained by the large amount of construction and traffic that occurs in a city. A large city like Baltimore would naturally have constant construction since new businesses are constantly moving in. This construction would lead to a signal flag, whether it's one placed on the ground or a construction worker directing people away from the zone. Construction would also lead to detours, which would create signal flags as well.

Construction also creates more traffic, something which is abundant in Baltimore already. As a result, there may be signal flags about the congestion ahead. Lots of traffic makes it harder for drivers to anticipate unexpected turns and maneuvers because there are a lot more cars to watch out for. This would, in turn, create more crashes as drivers cannot fully be aware of everything that is happening around them.

These crowded areas naturally create more traffic and signal flags, so the clustering of crashes with signal flags towards central Baltimore seems plausible.